fix(setup): durable retry for external dep downloads (Aaron 2026-04-29) #804
Merged
Aaron's mid-tick correction (verbatim, typos preserved):
"we can retury on external dependency download failures,
this goes against DST but we have not choice they are
external dependencies we need. Next time instead of
kicking a 2nd build we should fix it and reduce friction
for future builds."
Earlier in the session I recovered an elan-toolchain 502 by
running `gh run rerun --failed`. Aaron's correction names that
as the wrong fix LOCATION: the rerun made THIS build pass but
did nothing for FUTURE builds hitting the same upstream blip.
The durable fix lives in the code: tools/setup/common/curl-fetch.sh
already provides curl_fetch (file-output, --retry 5
--retry-delay 2 --retry-all-errors) per Aaron's 2026-04-28
framing. Two scripts bypassed it with raw `curl -fsSL`:
- tools/setup/linux.sh:87 (mise tarball download)
- tools/setup/common/elan.sh:29 (elan-init.sh download)
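For readers unfamiliar with the helper, the retry behaviour described above can be sketched roughly as follows. This is an illustration only, not the contents of tools/setup/common/curl-fetch.sh: the function name `sketch_curl_fetch`, its `<url> <output-file>` argument order, and the carried-over `-fsSL` flags are assumptions. Only the three retry flags come from the description above.

```shell
#!/usr/bin/env bash
# Illustrative sketch; NOT the real tools/setup/common/curl-fetch.sh.
# sketch_curl_fetch and its argument order are hypothetical.
sketch_curl_fetch() {
  url="$1"
  out="$2"
  # -f   fail on HTTP error status instead of saving the error page
  # -sS  quiet, but still print errors
  # -L   follow redirects
  # --retry 5 --retry-delay 2  retry up to five times, two seconds apart
  # --retry-all-errors         retry on any error, not only the ones curl
  #                            classifies as transient (needs curl >= 7.71.0)
  curl -fsSL --retry 5 --retry-delay 2 --retry-all-errors -o "$out" "$url"
}
```

A migrated call site would then read `sketch_curl_fetch "$URL" "$tmpfile"` where it previously read `curl -fsSL "$URL" -o "$tmpfile"`.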
Both migrated to curl_fetch. linux.sh adds the source line;
elan.sh runs as a subprocess from linux.sh + macos.sh, so it
must source curl-fetch.sh itself: a helper sourced in the
parent shell does NOT propagate to subprocess shells. elan.sh
therefore adds REPO_ROOT detection plus the source line.
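The subprocess point can be reproduced with throwaway files. The sketch below is a demonstration under assumptions: every file and function name in it is hypothetical, and it locates the helper relative to the script's own directory, which may differ from how elan.sh actually detects REPO_ROOT.

```shell
#!/usr/bin/env bash
# Demonstration only; all names here are hypothetical stand-ins.
demo_dir="$(mktemp -d)"

# Stand-in helper library (plays the role of curl-fetch.sh).
cat > "$demo_dir/helper.sh" <<'EOF'
helper_fn() { echo "helper available"; }
EOF

# Child A relies on the parent shell having sourced the helper.
cat > "$demo_dir/child_no_source.sh" <<'EOF'
#!/usr/bin/env bash
if command -v helper_fn >/dev/null 2>&1; then
  helper_fn
else
  echo "helper missing"
fi
EOF

# Child B finds its own directory and sources the helper itself.
cat > "$demo_dir/child_self_source.sh" <<'EOF'
#!/usr/bin/env bash
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
. "$script_dir/helper.sh"
helper_fn
EOF

# The parent sources the helper, then runs both children as subprocesses.
. "$demo_dir/helper.sh"
no_source_result="$(bash "$demo_dir/child_no_source.sh")"
self_source_result="$(bash "$demo_dir/child_self_source.sh")"
echo "child without its own source: $no_source_result"
echo "child with its own source:    $self_source_result"
```

The first child reports the helper as missing because shell functions are not inherited by a new `bash` process unless they are explicitly exported; the second child works because it does its own sourcing.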
The new memory file
memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md
captures the rule for future-Claude:
- external dep downloads ARE the DST exception class
- fix LOCATION matters: durable in code > ephemeral rerun
- when CI hits a transient external-dep failure, FIRST
check whether the call site uses curl_fetch (or
equivalent retry-equipped helper); if not, THAT is the
durable fix; only fall back to rerun for genuine
one-shot blips that retry-equipped code already
exhausted
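The "FIRST check the call site" step in the rule above could be done with a quick search. The sketch below is one possible way, not a tool from this repository: `find_raw_curl_calls` is a hypothetical name, and `grep -r` is a GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical triage helper: list lines under a directory that invoke curl
# directly instead of going through a retry-equipped wrapper.
find_raw_curl_calls() {
  dir="$1"
  # First grep: "curl" as a standalone command word followed by whitespace.
  #   "curl_fetch" does not match because "_" is not whitespace.
  # Second grep: drop comment-only lines (path:line:   # ...).
  grep -rnE '(^|[^[:alnum:]_-])curl[[:space:]]' "$dir" \
    | grep -vE '^[^:]+:[0-9]+:[[:space:]]*#' || true
}
```

Running `find_raw_curl_calls tools/setup` would be expected to list the helper's own internal `curl` invocation as well; any other hit is a candidate for migration.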
Composes with:
- memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md
(parent DST rule; this entry refines the
"external uncontrollable" exception with concrete domain
+ fix-location discipline)
- tools/setup/common/curl-fetch.sh (existing helper; this
work migrates the call sites that were bypassing it)
MEMORY.md updated paired-edit per the index-integrity rule.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
This PR makes external dependency downloads in the setup scripts more resilient by routing them through the existing retry-capable curl_fetch helper, and records the operational rule that retries are acceptable at the external-dependency boundary when implemented durably in-code (not via workflow reruns).
Changes:
- Source `tools/setup/common/curl-fetch.sh` in Linux setup and use `curl_fetch` for the mise tarball download.
- Make `elan.sh` self-sufficient (repo-root detection + sourcing `curl-fetch.sh`) and use `curl_fetch` for `elan-init.sh`.
- Add a new memory entry documenting “durable retry in code” vs “ephemeral rerun” discipline and index it in `memory/MEMORY.md`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/setup/linux.sh | Sources curl-fetch.sh and replaces a raw curl -fsSL download with curl_fetch. |
| tools/setup/common/elan.sh | Adds repo-root detection + sources curl-fetch.sh locally; replaces raw curl with curl_fetch. |
| memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md | New memory/rule writeup describing the “durable retry in code” exception and fix-location discipline. |
| memory/MEMORY.md | Adds the new memory entry to the index. |
…ry-flag qualifier + line-number drift)
Four Copilot findings on PR #804 addressed:
P1 (linux.sh): named-attribution "Aaron 2026-04-28/29" removed
from script comments (current-state code surface uses role-refs;
names go on history surfaces per docs/AGENT-BEST-PRACTICES.md
§named-attribution-carve-out). Reframed as "DST exception" +
"a workflow rerun" without naming.
P1 (elan.sh): same named-attribution rewrite.
P2 (linux.sh): comment overstated curl_fetch's retry-flag set as
unconditional. Clarified that --retry-all-errors is added when
the local curl supports it (curl-fetch.sh feature-detects via
`_curl_fetch_supports_retry_all_errors`).
P1 (memory file): hardcoded line numbers (elan.sh:29, linux.sh:87)
drift with each edit. Replaced with stable anchors (URL constant
names + grep instructions). Diff illustration kept but caveated.
History-surface attribution (memory file body, commit messages,
PR descriptions) preserves the named-channel quote per the
verbatim-preservation rule. Code-surface comments now use
role-refs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
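The P2 finding turns on feature detection: `--retry-all-errors` is only passed when the local curl understands it. The internals of `_curl_fetch_supports_retry_all_errors` are not shown on this page, so the sketch below is a guess at one workable approach. Probing the `curl --help all` output is an assumption, and both function names are hypothetical.

```shell
# Hypothetical probe: does the local curl list --retry-all-errors in its help?
sketch_supports_retry_all_errors() {
  curl --help all 2>/dev/null | grep -q -- '--retry-all-errors'
}

# Build the retry flag set, adding --retry-all-errors only when supported.
sketch_retry_flags() {
  flags="--retry 5 --retry-delay 2"
  if sketch_supports_retry_all_errors; then
    flags="$flags --retry-all-errors"
  fi
  printf '%s\n' "$flags"
}
```

On an older curl the helper then degrades to plain `--retry`, which still retries the failures curl itself classifies as transient.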
AceHack added a commit that referenced this pull request on Apr 29, 2026
…y correction (#804) (#805)
* chore(loop-tick-history): tick 05:50Z - drain (3 PRs) + Aaron mid-tick durable-retry correction (#804)
3 CLEAN PRs squash-merged (#801/#802/#803). Aaron's /btw aside
answered (EAT document location). Mid-tick correction caught the
elan-toolchain-502 rerun anti-pattern from earlier this session;
durable fix landed in PR #804 (memory file + linux.sh + elan.sh
migrated to curl_fetch). First non-tick-history substrate this
session.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(loop-tick-history): tick 0550Z - Copilot P2 fixes (memory/ prefix + explicit backlog paths)
PR #805 review thread from copilot-pull-request-reviewer:
1. Cross-reference style: memory file path needs `memory/` prefix
(consistent with other shards).
2. "pending task #307" reads ambiguously (looks like a GH PR
number). Replace with explicit backlog row paths (B-0062 +
B-0074 full paths).
Both fixes applied; row content unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 29, 2026
pending CI (#807)
Two CLEAN PRs merged. PR #804 (durable-retry fix) is now on main,
so future external-dep download flakes are absorbed via curl_fetch
retry. PR #806 (multi-AI feedback absorb bundle) is waiting on CI:
a real-dependency wait, not manufactured patience.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 29, 2026
…+ tick 0558Z shard
The human maintainer forwarded a multi-AI synthesis packet during
autonomous-loop tick 05:58Z:
- Deepseek's reassessment + 5-point pushback after correcting
earlier-incorrect search results
- Amara's filter-to-actionables (6 bounded items, with
"rerun is incident recovery; retry/cache is substrate
improvement" elevated as the load-bearing line)
- reference to an older Gemini log on tele+port+leap operational
resonance (already canonical at
memory/feedback_operational_resonance_*.md; not re-absorbed)
Verbatim absorb (per the channel-verbatim-preservation rule)
landed at:
docs/research/multi-ai-feedback-2026-04-29-deepseek-amara-on-loop-state.md
with §33 archive header (Scope / Attribution / Operational status /
Non-fusion disclaimer).
Four small P3 backlog rows file the bounded actionables (per
the maintainer's existing narrowing on task #309 — research-
grade only, no broad new substrate PRs):
B-0098 — tick-ordinal-continuity lint (or remove ordinals
from shards entirely; computed > narrated)
B-0099 — PR-count claims as derived metrics, not narrated prose
B-0100 — pure-wait tick backpressure / quiescence rule
B-0101 — small 5-bucket reviewer-artifact classification table
The 5th actionable (external-dep retry/cache) is already
addressed by PR #804 (durable-retry fix landed alongside the
"rerun-is-recovery / retry-is-substrate-improvement" rule).
The 6th actionable (evidence-claim language tightening) is
operational discipline, captured in the tick shard observation
column.
Tick shard at docs/hygiene-history/ticks/2026/04/29/0558Z.md
captures the work-stream summary.
Pattern: ONE consolidated PR for the absorb bundle (research
note + 4 backlog rows + tick shard) rather than 6 separate PRs,
honoring the maintainer's "don't open a bunch of new PRs"
narrowing while still preserving the verbatim record durably.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 29, 2026
…age (B-0101 taxonomy applied) + rebase (#810)
(1) PR #808 squash-merged.
(2) #806's 5 unresolved threads classified per B-0101 taxonomy:
- 1 DISPLAY_ARTIFACT (Copilot's "|| " excerpt was hallucinated;
actual file has correct table)
- 3 REVIEWER_SNAPSHOT_LAG (memory file IS on main post-#804 merge)
- 1 REAL_DEFECT already fixed in prior tick
All 5 resolved with explanatory comment.
(3) #806 branch 4 commits behind main → rebased + force-pushed;
CI recomputing.
First operational use of the B-0101 taxonomy, one tick after
filing it. The pattern: file the rule, validate it on the next
class instance.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request on Apr 29, 2026
…ck 0558Z (re-open of #806) (#811)
* absorb: multi-AI feedback packet (Deepseek + Amara) + 4 backlog rows + tick 0558Z shard
The human maintainer forwarded a multi-AI synthesis packet during
autonomous-loop tick 05:58Z:
- Deepseek's reassessment + 5-point pushback after correcting
earlier-incorrect search results
- Amara's filter-to-actionables (6 bounded items, with
"rerun is incident recovery; retry/cache is substrate
improvement" elevated as the load-bearing line)
- reference to an older Gemini log on tele+port+leap operational
resonance (already canonical at
memory/feedback_operational_resonance_*.md; not re-absorbed)
Verbatim absorb (per the channel-verbatim-preservation rule)
landed at:
docs/research/multi-ai-feedback-2026-04-29-deepseek-amara-on-loop-state.md
with §33 archive header (Scope / Attribution / Operational status /
Non-fusion disclaimer).
Four small P3 backlog rows file the bounded actionables (per
the maintainer's existing narrowing on task #309: research-grade
only, no broad new substrate PRs):
B-0098: tick-ordinal-continuity lint (or remove ordinals
from shards entirely; computed > narrated)
B-0099: PR-count claims as derived metrics, not narrated prose
B-0100: pure-wait tick backpressure / quiescence rule
B-0101: small 5-bucket reviewer-artifact classification table
The 5th actionable (external-dep retry/cache) is already
addressed by PR #804 (durable-retry fix landed alongside the
"rerun-is-recovery / retry-is-substrate-improvement" rule).
The 6th actionable (evidence-claim language tightening) is
operational discipline, captured in the tick shard observation
column.
Tick shard at docs/hygiene-history/ticks/2026/04/29/0558Z.md
captures the work-stream summary.
Pattern: ONE consolidated PR for the absorb bundle (research
note + 4 backlog rows + tick shard) rather than 6 separate PRs,
honoring the maintainer's "don't open a bunch of new PRs"
narrowing while still preserving the verbatim record durably.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(B-0101): markdownlint MD022 - collapse split heading to single line
PR #806 markdownlint failure: heading was wrapped across two `##`
lines, which markdownlint reads as two separate headings without
blanks between them (MD022/blanks-around-headings). Collapsed to
single line.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(B-0098/B-0099): Copilot P1 - portable grep boundary + explicit gh-login placeholder
Two real defects on PR #811:
B-0098: \b is non-portable in POSIX ERE (grep -E). On GNU/BSD
grep \b is treated as backspace/undefined. Replaced with -w
(whole-word match, supported on both GNU and BSD grep) and added
a comment documenting why.
B-0099: `@me` reads ambiguously in pseudocode (looks like a
literal token even though it IS valid GitHub search syntax for
the authenticated user). Replaced with explicit `<gh-login>`
placeholder + a clarifying note that `@me` also works.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(B-0098/B-0099): convergent reviewer corrections (round 3)
Three external reviewers (Amara, Claude.ai, Deepseek) flagged two
precision issues on PR #811:
B-0098: `grep -w` is GNU/BSD-common but not strictly
POSIX-portable. Replace single-claim wording with two viable
patterns: (a) `grep -woE` (GNU/BSD-common) and (b) strict
portable explicit-boundary pattern. Implementing contributor
picks based on portability priority.
B-0099: `author:@me` inside `--search` reads ambiguously and is
not the documented CLI shape. Replace with `gh pr list --author`
CLI flag, with both `<your-gh-login>` (explicit, preferred for
cold readability) and `@me` (valid CLI shorthand) shown.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(B-0101): split SNAPSHOT_MISMATCH into backward-stale + forward-dependent (Amara round-3+round-4)
Amara's correction: REVIEWER_SNAPSHOT_LAG was too broad - covered
both temporal directions.
Split into:
SNAPSHOT_MISMATCH (parent)
├─ BACKWARD_STALE_SNAPSHOT - reviewer behind reality
└─ FORWARD_CROSS_PR_REFERENCE - PR references sibling work not
   yet on base; valid only IF merge order is enforced
Same family, different remedies. Backward = verify-and-resolve.
Forward = encode dependency + don't resolve as "valid post-merge"
unless mechanically enforced.
Distilled rule (Amara): "A forward reference is not wrong if the
dependency is enforced. A forward reference is wrong if the
dependency is only hoped."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(B-0098): Codex P1 - strict-POSIX example must use only POSIX features
Codex correctly flagged that my "strict POSIX-portable" example
used bash-only features:
- [[ ]] (bash test, not POSIX)
- [[ a == *b* ]] (bash glob match, not POSIX)
- $(ls -1 ... | sort) for iteration
Replaced with strict POSIX:
- direct glob iteration `for file in pattern; do`
- `case ... in pattern) ;; esac` for glob match
- `printf` instead of `warn` (warn is shell function, not POSIX)
- redirect to stderr (>&2)
Option (a) keeps bashisms since it's labeled "GNU/BSD-common"
(works on every 4-shell target including bash + zsh on realistic
systems).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
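The strict-POSIX constructs listed in the last commit (glob iteration, `case` for pattern matching, `printf` to stderr) can be seen working together in the sketch below. It is not the B-0098 lint itself: the function name, the directory argument and the `*Z.md` naming check are hypothetical, chosen only to exercise those constructs.

```shell
# Hypothetical example: warn about shard files whose names do not end in "Z.md".
# Uses only POSIX sh features (no [[ ]], no ls parsing, no helper functions).
check_shard_names() {
  dir="$1"
  for file in "$dir"/*.md; do
    # If the glob matched nothing, the literal pattern is returned; skip it.
    [ -e "$file" ] || continue
    case "$file" in
      *Z.md) ;;  # expected shape, nothing to report
      *) printf 'warning: unexpected shard name: %s\n' "$file" >&2 ;;
    esac
  done
}
```

Because the loop iterates over the glob directly, file names containing spaces are handled without the word-splitting problems of `$(ls -1 ... | sort)`.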
Summary
Aaron's mid-tick correction during autonomous-loop tick 05:50Z, verbatim (typos preserved per the verbatim-preservation rule):
"we can retury on external dependency download failures, this goes against DST but we have not choice they are external dependencies we need. Next time instead of kicking a 2nd build we should fix it and reduce friction for future builds."
Earlier in this same session I recovered an elan-toolchain 502 by running `gh run rerun --failed`. Aaron's correction names that as the wrong fix LOCATION:
- `gh run rerun --failed` only papers over THIS build; same flake fails next time.
- `curl_fetch` (`--retry 5` in `tools/setup/`) absorbs future flakes too.
Changes
- `memory/feedback_external_dependency_download_retries_durable_fix_not_ephemeral_rerun_aaron_2026_04_29.md` (new), the rule for future-Claude: external dep downloads ARE the DST exception class; fix LOCATION matters; when CI hits a transient external-dep failure, FIRST check whether the call site uses `curl_fetch`; if not, THAT is the durable fix.
- `tools/setup/linux.sh`: sources `curl-fetch.sh`; replaces raw `curl -fsSL` for the mise tarball download with `curl_fetch`.
- `tools/setup/common/elan.sh`: adds REPO_ROOT detection + sources `curl-fetch.sh` (elan.sh runs as a subprocess from linux.sh + macos.sh; sourced helpers don't propagate to subprocess shells); replaces raw `curl -fsSL` for the elan-init.sh download with `curl_fetch`.
- `memory/MEMORY.md`: paired-edit pointer row per index-integrity rule.
Composes with
- `memory/feedback_retries_are_non_determinism_smell_DST_holds_investigate_first_2026_04_23.md`: parent DST rule; this entry refines the "external uncontrollable" exception with concrete domain + fix-location discipline.
- `tools/setup/common/curl-fetch.sh`: existing retry-equipped helper (per Aaron 2026-04-28 framing). This work migrates the call sites that were bypassing it.
Test plan
- `bash -n` syntax check on all 3 modified shell scripts (linux.sh / elan.sh / curl-fetch.sh)
🤖 Generated with Claude Code